Add experimental compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename CompactionSummarizingCompaction #191

Merged
dsfaccini merged 8 commits into main from capability/compaction on Jun 5, 2026

Conversation

@DouweM (Contributor) commented Apr 10, 2026

Adds a menu of context-compaction capabilities, offered as experimental under a new pydantic_ai_harness.experimental namespace.

Experimental gating (new convention)

There was no experimental convention in the harness yet, so this establishes one, mirroring pydantic (which has pydantic.experimental + PydanticExperimentalWarning):

  • The package physically lives at pydantic_ai_harness/experimental/compaction/ — the import path itself signals experimental, and there is no top-level export.
  • Importing a capability emits a HarnessExperimentalWarning. The message hands the user a single, category-wide silence filter — one line mutes every experimental capability, not one per capability:
import warnings
from pydantic_ai_harness.experimental import HarnessExperimentalWarning

warnings.filterwarnings('ignore', category=HarnessExperimentalWarning)

from pydantic_ai_harness.experimental.compaction import TieredCompaction

Importing the experimental package on its own does not warn, so the warning class can be imported first and the filter installed before any capability is imported.
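The category-wide filter can be illustrated without the harness installed. The sketch below is a stand-in under stated assumptions: `HarnessExperimentalWarning` is redefined locally, and `import_capability` is a hypothetical helper that only simulates the warning a capability import would emit. It shows that a single `filterwarnings` call on the category mutes every capability.

```python
import warnings


class HarnessExperimentalWarning(Warning):
    """Local stand-in for the harness warning category (assumption, not the real class)."""


def import_capability(name: str) -> str:
    """Hypothetical helper: simulate the warning emitted when a capability is imported."""
    warnings.warn(
        f'{name} is experimental and may change without notice.',
        category=HarnessExperimentalWarning,
        stacklevel=2,
    )
    return name


def count_warnings(silence: bool) -> int:
    """Return how many experimental warnings two simulated imports emit."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        if silence:
            # One filter on the category covers every experimental capability.
            warnings.filterwarnings('ignore', category=HarnessExperimentalWarning)
        import_capability('TieredCompaction')
        import_capability('ClearToolResults')
    return len(caught)
```

Calling `count_warnings(False)` records one warning per simulated import, while `count_warnings(True)` records none, because the `ignore` filter is inserted ahead of the `always` filter.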

Capabilities

| Class | Cost | What it does |
| --- | --- | --- |
| SlidingWindow | zero-LLM | Drop oldest whole messages down to a tail |
| ClearToolResults | zero-LLM | Blank old tool results in place, keep last keep_pairs (Anthropic clear_tool_uses) |
| DeduplicateFileReads | zero-LLM | Keep only the latest read per file (via a required file_key seam) |
| SummarizingCompaction | one LLM call | Structured-section summary of older messages (renamed from Compaction) |
| TieredCompaction | escalates | Cheap passes first; summarize only if still over target_tokens |
| LimitWarner | zero-LLM | Inject URGENT/CRITICAL warnings as limits approach |
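To make the ClearToolResults row concrete, here is a minimal sketch of blanking old tool results while keeping the last `keep_pairs`. It assumes a simplified history of plain strings and a hypothetical `ToolReturn` dataclass; the placeholder text and the function name are illustrative and not taken from the PR. The tool-call id is left untouched so the call/return pairing survives.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

CLEARED = '[tool result cleared]'  # hypothetical placeholder text


@dataclass(frozen=True)
class ToolReturn:
    """Simplified tool result (assumption: real histories use richer message parts)."""

    tool_call_id: str
    content: str


def clear_old_tool_results(
    history: list[str | ToolReturn], keep_pairs: int
) -> list[str | ToolReturn]:
    """Blank every tool result except the last `keep_pairs`, keeping positions and ids."""
    indices = [i for i, item in enumerate(history) if isinstance(item, ToolReturn)]
    # `indices[:-0]` would be empty, so the zero case is handled explicitly.
    to_clear = set(indices[:-keep_pairs]) if keep_pairs > 0 else set(indices)
    return [
        replace(item, content=CLEARED)
        if isinstance(item, ToolReturn) and i in to_clear
        else item
        for i, item in enumerate(history)
    ]
```

This sketch returns a new list rather than mutating the input; only the content of older results changes, so the message count and ordering stay the same.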

A shared, exported CompactionStrategy.compact(messages, ctx) seam lets TieredCompaction drive its tiers directly and re-measure the history between them, so it escalates only while the history is still over target_tokens. All strategies preserve tool-call/return pairing.
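The escalation idea can be sketched as a loop that re-measures after every tier. Everything here is a simplified assumption: messages are plain strings, the `ctx` argument of the real seam is omitted, and `Strategy`, `DropOldest`, `FakeSummarize`, and `tiered_compact` are hypothetical names rather than the harness API.

```python
from typing import Protocol

Messages = list[str]  # simplified: real histories hold structured messages


def estimate_tokens(messages: Messages) -> int:
    """Rough size estimate using a 4-characters-per-token heuristic."""
    return sum(len(m) for m in messages) // 4


class Strategy(Protocol):
    """Hypothetical stand-in for the strategy seam (the real one also takes ctx)."""

    def compact(self, messages: Messages) -> Messages: ...


class DropOldest:
    """Cheap tier: keep only the last `keep` messages (`keep` must be positive)."""

    def __init__(self, keep: int) -> None:
        self.keep = keep

    def compact(self, messages: Messages) -> Messages:
        return messages[-self.keep:]


class FakeSummarize:
    """Expensive tier stand-in: a real one would call an LLM here."""

    def __init__(self) -> None:
        self.calls = 0

    def compact(self, messages: Messages) -> Messages:
        self.calls += 1
        return [f'summary of {len(messages)} messages']


def tiered_compact(messages: Messages, tiers: list[Strategy], target_tokens: int) -> Messages:
    """Run tiers in order, stopping as soon as the history fits the target."""
    for tier in tiers:
        if estimate_tokens(messages) <= target_tokens:
            break  # already under target: do not escalate further
        messages = tier.compact(messages)
    return messages
```

With ten 40-character messages and a target of 30 tokens, the cheap tier alone brings the history under target and the summarizer is never invoked; with a much lower target, the loop escalates to the summarizing tier.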

Layout

One module per capability under experimental/compaction/, with cross-capability utilities (token estimation, the CompactionStrategy protocol, tool-pair-safe cutoffs, in-place clearing) in _shared.py.

Notes

  • SummarizingCompaction threads ctx.usage into the summary call, so the summary's tokens and its request are folded into the run's usage. This keeps accounting honest and consistent, and a runaway summary cannot evade a request limit.
  • Per-result truncation was intentionally dropped in favour of a future overflow-to-file capability.
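The usage-folding note can be pictured with a toy accumulator. This is only a sketch under assumptions: `RunUsage`, `record`, and `RequestLimitExceeded` are hypothetical names, not the library's usage types. The point is that a summary call recorded on the same object counts against the same request limit as ordinary model requests.

```python
from dataclasses import dataclass


class RequestLimitExceeded(RuntimeError):
    """Hypothetical error raised when the shared request budget is spent."""


@dataclass
class RunUsage:
    """Toy usage accumulator shared by the run and the summary call."""

    requests: int = 0
    total_tokens: int = 0

    def record(self, tokens: int, request_limit: int) -> None:
        """Count one request and its tokens, refusing to exceed the limit."""
        if self.requests + 1 > request_limit:
            raise RequestLimitExceeded(f'request limit of {request_limit} reached')
        self.requests += 1
        self.total_tokens += tokens
```

If the agent's own request and the summary request both call `record` on the same `RunUsage`, a limit of two is exhausted after those two calls, and a further summary attempt raises instead of running unaccounted.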

Quality

100% branch coverage; ruff + pyright strict clean; tests use TestModel/mocks only. Capability README.md carries an experimental banner with the import path and silence snippet.

🤖 Generated with Claude Code

DouweM and others added 3 commits April 2, 2026 05:28
Implements three compaction-related capabilities for managing conversation
context in long-running agents:

- SlidingWindow: zero-cost message trimming that preserves tool-call pairs
- LimitWarner: injects warnings when approaching iteration/token limits
- Compaction: LLM-powered summarization of older messages

All three use the before_model_request hook to modify request_context.messages
transparently. The safe cutoff logic ensures tool-call / tool-return pairs are
never orphaned, preventing HTTP 400 errors from LLM providers.

Closes #21

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
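The safe-cutoff idea from the commit message above can be sketched by tracking tool-call ids. This is an illustrative version under assumptions, not the PR's implementation: `Msg` is a hypothetical simplified message, and the cutoff is moved earlier until every tool return that would be kept still has its matching call.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Simplified message: ids of tool calls it makes and tool results it returns."""

    calls: set[str] = field(default_factory=set)
    returns: set[str] = field(default_factory=set)


def safe_cutoff(messages: list[Msg], cutoff: int) -> int:
    """Move `cutoff` earlier until no kept tool return is missing its call."""
    cutoff = max(0, min(cutoff, len(messages)))
    while cutoff > 0:
        kept = messages[cutoff:]
        called: set[str] = set().union(*(m.calls for m in kept))
        returned: set[str] = set().union(*(m.returns for m in kept))
        if returned <= called:
            break  # every kept return has its call: the cutoff is safe
        cutoff -= 1
    return cutoff
```

Because it compares id sets instead of scanning a fixed number of neighbouring messages, this sketch does not depend on how many parallel tool calls a single response contains. It is quadratic in the worst case, which is acceptable for an illustration.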
Add explicit `set[str]` type annotations and replace unnecessary
`isinstance` checks with plain `else` branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Implements three improvements from the audit findings on PR #140:

- Optional `tokenizer: Callable[[str], int] | None` parameter on SlidingWindow,
  Compaction, estimate_token_count, and _find_token_cutoff. When provided,
  enables accurate token counting; the 4-chars/token heuristic stays as fallback.

- `preserve_first_user_message: bool = True` on SlidingWindow and Compaction.
  When True, the first ModelRequest containing a UserPromptPart is always
  retained after trimming/compaction, preserving the original task context.

- `incremental: bool = True` on Compaction. When True and a prior compaction
  summary exists in the message history, it is included in the summarization
  prompt via a <previous_summary> tag so the LLM extends it rather than
  regenerating from scratch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
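A minimal sketch of the optional-tokenizer behaviour described in that commit: when a tokenizer callable is supplied it is used, otherwise the 4-chars/token heuristic applies. The signature below is an assumption (it takes plain strings rather than messages), and rounding up is a choice made for this sketch.

```python
from __future__ import annotations

from collections.abc import Callable

CHARS_PER_TOKEN = 4  # fallback heuristic mentioned in the commit message


def estimate_token_count(
    texts: list[str], tokenizer: Callable[[str], int] | None = None
) -> int:
    """Count tokens with `tokenizer` if given, else estimate from character length."""
    if tokenizer is not None:
        return sum(tokenizer(text) for text in texts)
    total_chars = sum(len(text) for text in texts)
    # Ceiling division, so short non-empty inputs never estimate to zero.
    return -(-total_chars // CHARS_PER_TOKEN)
```

For example, a whitespace-splitting callable such as `lambda s: len(s.split())` can stand in for a real tokenizer in tests.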

@DouweM (Contributor, Author) commented Apr 10, 2026

Originally posted by @DouweM in #140 comment (PR was recreated)

Note: This PR implements client-side compaction (LLM summarization + sliding window). Provider-side compaction (OpenAI/Anthropic) additionally requires the core primitive in #141 (CompactionPart message type + compact_messages on Model).

@DouweM (Contributor, Author) commented Apr 10, 2026

Originally posted by @DouweM in #140 comment (PR was recreated)

Audit vs prior art: Compaction

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Comment on lines +726 to +734
system_parts = _extract_system_prompts(messages)
to_summarize = messages[:cutoff]
preserved = messages[cutoff:]

previous_summary = _extract_previous_summary(messages) if self.incremental else None
summary = await self._summarize(to_summarize, previous_summary=previous_summary)

summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
summary_message = ModelRequest(parts=[*system_parts, summary_part])

🔴 Old compaction summaries accumulate as SystemPromptParts across multiple compaction cycles

After the first compaction, the summary message contains [SystemPromptPart('original sys prompt'), SystemPromptPart('Summary of previous conversation:\n\n...')]. When a second compaction triggers, _extract_system_prompts(messages) at line 726 extracts ALL leading SystemPromptParts from this message — including the old summary part (since it's also a SystemPromptPart). The old summary is then re-included in the new summary message at line 734 alongside the new summary. After N compactions, the summary message contains N stale summary parts plus the new one, growing the context unboundedly and defeating the purpose of compaction.

Trace through two compaction cycles

After first compaction, result.messages[0] = ModelRequest(parts=[SystemPromptPart('sys'), SystemPromptPart('Summary of previous conversation:\n\nfirst summary')]).

When second compaction triggers, _extract_system_prompts (src/pydantic_harness/compaction.py:594-605) sees both parts are SystemPromptPart, extracts both. Then line 734 creates ModelRequest(parts=[SystemPromptPart('sys'), SystemPromptPart('...first summary'), SystemPromptPart('...second summary')]). The old summary is never removed.

Suggested change
- system_parts = _extract_system_prompts(messages)
- to_summarize = messages[:cutoff]
- preserved = messages[cutoff:]
- previous_summary = _extract_previous_summary(messages) if self.incremental else None
- summary = await self._summarize(to_summarize, previous_summary=previous_summary)
- summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
- summary_message = ModelRequest(parts=[*system_parts, summary_part])
+ system_parts = [
+     p for p in _extract_system_prompts(messages)
+     if not p.content.startswith(_SUMMARY_PREFIX)
+ ]
+ to_summarize = messages[:cutoff]
+ preserved = messages[cutoff:]
+ previous_summary = _extract_previous_summary(messages) if self.incremental else None
+ summary = await self._summarize(to_summarize, previous_summary=previous_summary)
+ summary_part = SystemPromptPart(content=f'{_SUMMARY_PREFIX}{summary}')
+ summary_message = ModelRequest(parts=[*system_parts, summary_part])
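The effect of filtering on the summary prefix can be checked in isolation. The sketch below models system prompt parts as plain strings, which is an assumption made for brevity; `build_summary_parts` is a hypothetical helper, and the prefix text is the one quoted in the trace above.

```python
SUMMARY_PREFIX = 'Summary of previous conversation:\n\n'


def build_summary_parts(leading_system_parts: list[str], new_summary: str) -> list[str]:
    """Keep genuine system prompts, drop stale summaries, append the new summary."""
    fresh = [p for p in leading_system_parts if not p.startswith(SUMMARY_PREFIX)]
    return [*fresh, f'{SUMMARY_PREFIX}{new_summary}']


# Two compaction cycles: the second one receives the output of the first.
first = build_summary_parts(['sys'], 'first summary')
second = build_summary_parts(first, 'second summary')
```

After the second cycle the list still holds two parts, the original system prompt and the newest summary, instead of growing by one summary per cycle.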

Comment thread pyproject.toml Outdated

[tool.coverage.report]
- fail_under = 100
+ fail_under = 98

🚩 Coverage threshold lowered from 100% to 98%

The fail_under threshold in pyproject.toml:96 was reduced from 100 to 98, with the commit noting 'due to branch coverage of elif chains'. This permanently lowers the bar for the entire project. Consider using # pragma: no branch on specific elif chains instead of lowering the global threshold.
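As a sketch of the alternative suggested here, coverage.py's `# pragma: no branch` comment can mark a final `elif` whose false branch is never taken, so the partial branch does not count against the threshold. The function below is a made-up example, not code from the PR.

```python
from __future__ import annotations


def part_kind(part: str | bytes) -> str:
    """Classify a value; the final elif is exhaustive given the declared types."""
    if isinstance(part, str):
        return 'text'
    elif isinstance(part, bytes):  # pragma: no branch
        return 'binary'
    # Defensive fallback that no typed caller should reach.
    raise AssertionError('unreachable')  # pragma: no cover
```

The pragma applies only to the marked line, so the global `fail_under` value can stay at 100 while the known-unreachable branch is excluded.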


# Token estimation
# ---------------------------------------------------------------------------

_CHARS_PER_TOKEN = 4

This will underestimate for Anthropic models unfortunately. It works pretty well for OpenAI ones. I settled on 2.5 in Code Puppy to give a lot of slack (to avoid the errors in Vertex).

Coming back to this, I suggest we make it configurable somehow? Perhaps an environment var (yucky, but there should be a way to override it for power users)
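One way the override floated in this comment could look is sketched below. The environment variable name `HARNESS_CHARS_PER_TOKEN` and both helper functions are hypothetical; nothing in the PR confirms such a setting exists. Invalid or non-positive values fall back to the default.

```python
from __future__ import annotations

import math
import os
from collections.abc import Mapping

DEFAULT_CHARS_PER_TOKEN = 4.0
ENV_VAR = 'HARNESS_CHARS_PER_TOKEN'  # hypothetical variable name


def chars_per_token(env: Mapping[str, str] | None = None) -> float:
    """Read the ratio from the environment, falling back to the default."""
    source = os.environ if env is None else env
    raw = source.get(ENV_VAR)
    if raw is None:
        return DEFAULT_CHARS_PER_TOKEN
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_CHARS_PER_TOKEN
    return value if value > 0 else DEFAULT_CHARS_PER_TOKEN


def estimate_tokens(char_count: int, ratio: float) -> int:
    """Estimate tokens from a character count, rounding up."""
    return math.ceil(char_count / ratio)
```

With the default ratio of 4, a 1000-character history estimates to 250 tokens; with the more conservative 2.5 mentioned above, the same history estimates to 400, which leaves more slack before a provider limit is hit.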

segments.append(str(part.content))
else:
for part in msg.parts:
if isinstance(part, TextPart):

You don't want to include ThinkingPart?

# Safe cutoff logic — preserves tool-call / tool-return pairs
# ---------------------------------------------------------------------------

_TOOL_PAIR_SEARCH_RANGE = 5

If I'm understanding this correctly, it could fail if your model performs any number > 5 parallel tool calls.

"""Number of tail messages to preserve after compaction (message-count trigger)."""

keep_tokens: int | None = None
"""Target token budget to preserve after compaction (token-count trigger).

Love this <3 - I used this strategy in Code Puppy and the agent keeps coherence very nicely. It can get expensive though.

@mpfaffenberger mpfaffenberger left a comment

Left a few comments. Hope they're helpful.

Comment thread src/pydantic_harness/compaction.py Outdated
Comment on lines +647 to +648
model: str
"""Model to use for generating summaries (e.g. ``'openai:gpt-4o-mini'``)."""

This should likely include KnownModelName and Model, and use infer_model under the hood.

@DouweM DouweM added this to the 2026-05 milestone Apr 23, 2026
@dsfaccini dsfaccini changed the title from "Add compaction capabilities: SlidingWindow, LimitWarner, Compaction" to "Add compaction capabilities: SlidingWindow, LimitWarner, Compaction" on Jun 2, 2026
…Compaction → SummarizingCompaction

Extend the compaction menu with the SOTA-aligned strategies missing from the initial
cut, and align naming with the "compaction" umbrella term.

- ClearToolResults: zero-LLM in-place clearing of old tool results (Anthropic
  clear_tool_uses); keep_pairs, exclude_tools, clear_tool_inputs (JSON-valid args),
  min_clear_tokens to protect the prompt cache.
- DeduplicateFileReads: zero-LLM; keep only the latest read per file via a required
  file_key seam.
- TieredCompaction: escalation orchestrator — cheap passes first, summarize only if
  still over target_tokens. CompactionStrategy protocol exported for custom tiers.
- Rename Compaction → SummarizingCompaction; structured-section summary prompt; model
  now optional (inherits the run's model); summary-call usage folded into the parent run.

All strategies preserve tool-call/return pairing. compaction.py at 100% branch
coverage; pyright strict + ruff clean. Adds capability README (compaction.md).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
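The DeduplicateFileReads bullet in the commit above can be illustrated with a small sketch. It assumes a simplified `ToolResult` dataclass and a caller-supplied `file_key` function that returns a file identifier, or `None` for results that are not file reads; these shapes, the placeholder text, and the function name are assumptions rather than the harness API.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, replace

SUPERSEDED = '[superseded by a later read of the same file]'  # hypothetical text


@dataclass(frozen=True)
class ToolResult:
    """Simplified tool result (assumption: real histories use richer parts)."""

    tool_name: str
    args: dict[str, str]
    content: str


def dedupe_file_reads(
    results: list[ToolResult], file_key: Callable[[ToolResult], str | None]
) -> list[ToolResult]:
    """Keep the latest read per file; blank the content of earlier reads."""
    latest: dict[str, int] = {}
    for i, result in enumerate(results):
        key = file_key(result)
        if key is not None:
            latest[key] = i  # later reads overwrite earlier indices
    out: list[ToolResult] = []
    for i, result in enumerate(results):
        key = file_key(result)
        if key is not None and latest[key] != i:
            result = replace(result, content=SUPERSEDED)
        out.append(result)
    return out


def by_path(result: ToolResult) -> str | None:
    """Example key function: only `read_file` results are treated as file reads."""
    return result.args.get('path') if result.tool_name == 'read_file' else None
```

Making the key function a required argument keeps the strategy independent of any particular tool naming: the caller decides which results count as reads of the same file.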
@dsfaccini dsfaccini changed the title from "Add compaction capabilities: SlidingWindow, LimitWarner, Compaction" to "Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction" on Jun 3, 2026
Resolve the repo restructure (package renamed pydantic_harness → pydantic_ai_harness,
moved to per-capability subpackages). Migrate compaction into the new layout:

- pydantic_ai_harness/compaction/{__init__,_capability}.py + README.md
- root __init__.py exposes the 6 compaction capabilities via lazy __getattr__
- tests moved to tests/compaction/; imports split public vs ._capability
- drop now-unused noqa (main's ruff config does not enforce D102/D105)
- widen _call_args test helper for the new ToolCallPart.args union (ToolSearchArgs)
- pyproject coverage config kept identical to main (protocol stub excluded via pragma)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dsfaccini dsfaccini changed the title from "Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename Compaction → SummarizingCompaction" to "Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename CompactionSummarizingCompaction" on Jun 3, 2026
The summary call's tokens and request are folded into the run's usage by design —
consistent, cost-honest, and runaway-safe. Make that explicit in the README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
David SF and others added 2 commits June 3, 2026 22:20
Split the monolithic _capability.py into a module per capability plus a _shared module
for cross-capability utilities (token estimation, the CompactionStrategy protocol,
tool-pair-safe cutoffs, first-user preservation, in-place tool-result clearing).

- _shared.py exposes its package-internal API without leading underscores (the module
  itself is private); genuinely module-local helpers keep their underscore.
- _sliding_window / _clear_tool_results / _deduplicate_file_reads / _limit_warner /
  _summarizing_compaction / _tiered_compaction each own one capability.
- __init__.py re-exports the public API; tests import privates from their new homes.

No behavior change. Every compaction module at 100% branch coverage; ruff + pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ental

Compaction is offered as experimental rather than a top-level capability. Since no
experimental convention existed in the harness, this establishes one, mirroring pydantic:

- Physically moved to pydantic_ai_harness/experimental/compaction/ — the import path itself
  signals experimental; there is no top-level export (reverted root __init__).
- Importing a capability emits HarnessExperimentalWarning; its message hands the user a single
  category-wide filter that silences every experimental capability at once (no per-capability
  suppression). Importing the `experimental` package alone does not warn, so the warning class
  can be pulled in to silence first.
- README gains a prominent experimental banner with the import path + silence snippet.
- pytest filterwarnings ignores the category; a dedicated test asserts the message + that one
  filter mutes all experimental warnings.

100% coverage; ruff + pyright strict clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dsfaccini dsfaccini changed the title from "Add compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename CompactionSummarizingCompaction" to "Add experimental compaction menu: ClearToolResults, DeduplicateFileReads, TieredCompaction; rename CompactionSummarizingCompaction" on Jun 4, 2026
@dsfaccini dsfaccini marked this pull request as ready for review June 5, 2026 00:00
@dsfaccini dsfaccini merged commit 442e9ff into main Jun 5, 2026
18 checks passed

4 participants